CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines
Abstract
Existing general-purpose clinical natural language processing (NLP) systems such as MetaMap and the Clinical Text Analysis and Knowledge Extraction System (cTAKES) have been successfully applied to information extraction from clinical text. However, end users often have to customize existing systems for their individual tasks, which can require substantial NLP skills. Here we present CLAMP (Clinical Language Annotation, Modeling, and Processing), a newly developed clinical NLP toolkit that provides not only state-of-the-art NLP components but also a user-friendly graphical user interface that helps users quickly build customized NLP pipelines for their individual applications. Our evaluation shows that the CLAMP default pipeline achieved good performance on named entity recognition and concept encoding. We also demonstrate the efficiency of the CLAMP graphical user interface in building customized, high-performance NLP pipelines with two use cases: extracting smoking status and extracting lab test values. CLAMP is publicly available for research use, and we believe it is a unique asset for the clinical NLP community.
Similar resources
Bluima: a UIMA-based NLP Toolkit for Neuroscience
This paper describes Bluima, a natural language processing (NLP) pipeline focusing on the extraction of neuroscientific content and based on the UIMA framework. Bluima builds upon models from biomedical NLP (BioNLP) like specialized tokenizers and lemmatizers. It adds further models and tools specific to neuroscience (e.g. named entity recognizer for neuron or brain region mentions) and provide...
IceNLP: a natural language processing toolkit for icelandic
Icelandic is a morphologically complex language, for which language technology resources are scarce. Only a few years ago, it could be stated that language technology was practically non-existent in Iceland. In this paper, we describe the development of an NLP toolkit for processing the language, the challenges faced and the decisions made during development. The current version of the toolkit ...
An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines)
Natural Language Processing continues to grow in popularity in a range of research and commercial applications, yet managing the wide array of potential NLP components remains a difficult problem. This paper describes CURATOR, an NLP management framework designed to address some common problems and inefficiencies associated with building NLP process pipelines; and EDISON, an NLP data structure ...
PSI-Toolkit: A Natural Language Processing Pipeline
The paper presents the main ideas and the architecture of the open source PSI-Toolkit, a set of linguistic tools being developed within a project financed by the Polish Ministry of Science and Higher Education. The toolkit is intended for experienced language engineers as well as casual users not having any technological background. The former group of users is delivered a set of libraries that...
Pipelines, Templates and Transformations: XML for Natural Language Generation
The paper discusses a number of ways in which XML can be used in natural language generation, including XML-based pipeline architectures, template-based generation with XSL templates, and tree-to-tree transformations. The ideas are based on practical experience in building an experimental XML-based generation component for a spoken dialogue system. Prototype implementations using DOM, XSL and Tra...
Journal: Journal of the American Medical Informatics Association (JAMIA)
Publication date: 2017